Results 1 - 20 of 68
1.
bioRxiv ; 2024 Apr 03.
Article in English | MEDLINE | ID: mdl-38617337

ABSTRACT

Principal component analysis (PCA) is widely used to control for population structure in genome-wide association studies (GWAS). Top principal components (PCs) typically reflect population structure, but challenges arise in deciding how many PCs are needed and ensuring that PCs do not capture other artifacts such as regions with atypical linkage disequilibrium (LD). In response to the latter, many groups suggest performing LD pruning or excluding known high LD regions prior to PCA. However, these suggestions are not universally implemented and the implications for GWAS are not fully understood, especially in the context of admixed populations. In this paper, we investigate the impact of pre-processing and the number of PCs included in GWAS models in African American samples from the Women's Health Initiative SNP Health Association Resource and two Trans-Omics for Precision Medicine Whole Genome Sequencing Project contributing studies (Jackson Heart Study and Genetic Epidemiology of Chronic Obstructive Pulmonary Disease Study). In all three samples, we find the first PC is highly correlated with genome-wide ancestry whereas later PCs often capture local genomic features. The pattern of which, and how many, genetic variants are highly correlated with individual PCs differs from what has been observed in prior studies focused on European populations and leads to distinct downstream consequences: adjusting for such PCs yields biased effect size estimates and elevated rates of spurious associations due to the phenomenon of collider bias. Excluding high LD regions identified in previous studies does not resolve these issues. LD pruning proves more effective, but the optimal choice of thresholds varies across datasets.
Altogether, our work highlights unique issues that arise when using PCA to control for ancestral heterogeneity in admixed populations and demonstrates the importance of careful pre-processing and diagnostics to ensure that PCs capturing multiple local genomic features are not included in GWAS models.
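LD pruning before PCA can be illustrated with a greedy, window-based filter. The sketch below is a simplified illustration in the spirit of window-based pruning tools such as PLINK's `--indep-pairwise`, not the pre-processing pipeline used in this paper. The function names `r_squared` and `ld_prune`, the window measured in variant positions, and the r² threshold are hypothetical choices for illustration; the abstract itself notes that good thresholds vary across datasets.

```python
from typing import List, Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation between two genotype-dosage vectors.

    Returns 0.0 when either vector has zero variance (e.g. a monomorphic variant).
    """
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def ld_prune(genotypes: List[Sequence[float]], window: int, r2_max: float) -> List[int]:
    """Greedy LD pruning over variants listed in genomic order (hypothetical helper).

    genotypes[v] holds the 0/1/2 dosages of variant v across samples.
    A variant is kept only if its r^2 with every already-kept variant
    at most `window` positions upstream is below `r2_max`.
    Returns the indices of the kept variants.
    """
    kept: List[int] = []
    for v, g in enumerate(genotypes):
        # Compare only against kept variants that fall inside the window.
        nearby = [u for u in kept if v - u <= window]
        if all(r_squared(genotypes[u], g) < r2_max for u in nearby):
            kept.append(v)
    return kept


if __name__ == "__main__":
    variants = [
        [0, 1, 2, 0, 1, 2],  # variant 0
        [0, 1, 2, 0, 1, 2],  # variant 1: identical to variant 0 (r^2 = 1)
        [0, 0, 0, 2, 2, 2],  # variant 2: uncorrelated with variant 0 (r^2 = 0)
    ]
    print(ld_prune(variants, window=10, r2_max=0.2))  # [0, 2]
```

PCA would then be run on the kept variants only; a stricter `r2_max` or a wider window removes more variants.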

2.
Am J Hum Genet ; 111(4): 691-700, 2024 Apr 04.
Article in English | MEDLINE | ID: mdl-38513668

ABSTRACT

We present a method for efficiently identifying clusters of identical-by-descent haplotypes in biobank-scale sequence data. Our multi-individual approach enables much more computationally efficient inference of identity by descent (IBD) than approaches that infer pairwise IBD segments and provides locus-specific IBD clusters rather than IBD segments. Our method's computation time, memory requirements, and output size scale linearly with the number of individuals in the dataset. We also present a method for using multi-individual IBD to detect alleles changed by gene conversion. Application of our methods to the autosomal sequence data for 125,361 White British individuals in the UK Biobank detects more than 9 million converted alleles. This is 2,900 times more alleles changed by gene conversion than were detected in a previous analysis of familial data. We estimate that more than 250,000 sequenced probands and a much larger number of additional genomes from multi-generational family members would be required to find a similar number of alleles changed by gene conversion using a family-based approach. Our IBD clustering method is implemented in the open-source ibd-cluster software package.


Subjects
Biological Specimen Banks; Gene Conversion; Humans; Software; Haplotypes/genetics; Chromosomes; Polymorphism, Single Nucleotide
3.
bioRxiv ; 2023 Nov 05.
Article in English | MEDLINE | ID: mdl-37961601

ABSTRACT

We present a method for efficiently identifying clusters of identical-by-descent haplotypes in biobank-scale sequence data. Our multi-individual approach enables much more efficient collection and storage of identity by descent (IBD) information than approaches that detect and store pairwise IBD segments. Our method's computation time, memory requirements, and output size scale linearly with the number of individuals in the dataset. We also present a method for using multi-individual IBD to detect alleles changed by gene conversion. Application of our methods to the autosomal sequence data for 125,361 White British individuals in the UK Biobank detects more than 9 million converted alleles. This is 2900 times more alleles changed by gene conversion than were detected in a previous analysis of familial data. We estimate that more than 250,000 sequenced probands and a much larger number of additional genomes from multi-generational family members would be required to find a similar number of alleles changed by gene conversion using a family-based approach.

4.
Genome Med ; 15(1): 52, 2023 Jul 17.
Article in English | MEDLINE | ID: mdl-37461045

ABSTRACT

BACKGROUND: Metabolic pathways are related to physiological functions and disease states and are influenced by genetic variation and environmental factors. Hispanic/Latino individuals have ancestry-derived genomic regions (local ancestry) from their recent admixture that have been less characterized for associations with metabolite abundance and disease risk. METHODS: We performed admixture mapping of 640 circulating metabolites in 3887 Hispanic/Latino individuals from the Hispanic Community Health Study/Study of Latinos (HCHS/SOL). Metabolites were quantified in fasting serum through non-targeted mass spectrometry (MS) analysis using ultra-performance liquid chromatography-MS/MS. Replication was performed in 1856 nonoverlapping HCHS/SOL participants with metabolomic data. RESULTS: By leveraging local ancestry, this study identified significant ancestry-enriched associations for 78 circulating metabolites at 484 independent regions, including 116 novel metabolite-genomic region associations that replicated in an independent sample. Among the main findings, we identified Native American enriched genomic regions at chromosomes 11 and 15, mapping to FADS1/FADS2 and LIPC, respectively, associated with reduced long-chain polyunsaturated fatty acid metabolites implicated in metabolic and inflammatory pathways. An African-derived genomic region at chromosome 2 was associated with N-acetylated amino acid metabolites. This region, mapped to ALMS1, is associated with chronic kidney disease, a disease that disproportionately burdens individuals of African descent. CONCLUSIONS: Our findings provide important insights into differences in metabolite quantities related to ancestry in admixed populations including metabolites related to regulation of lipid polyunsaturated fatty acids and N-acetylated amino acids, which may have implications for common diseases in populations.


Subjects
Genome-Wide Association Study; Hispanic or Latino; Tandem Mass Spectrometry; Humans; Black People/genetics; Genome, Human; Genome-Wide Association Study/methods; Hispanic or Latino/genetics; Polymorphism, Single Nucleotide; American Indian or Alaska Native/genetics; Metabolism/genetics; Population Groups/ethnology; Population Groups/genetics
5.
G3 (Bethesda) ; 13(10), 2023 Sep 30.
Article in English | MEDLINE | ID: mdl-37497617

ABSTRACT

The effective size of a population (Ne) in the recent past can be estimated through analysis of identity-by-descent (IBD) segments. Several methods have been developed for estimating Ne from autosomal IBD segments, but no such effort has been made with X chromosome IBD segments. In this work, we propose a method to estimate the X chromosome effective population size from X chromosome IBD segments. We show how to use the estimated autosome Ne and X chromosome Ne to estimate the female and male effective population sizes. We demonstrate the accuracy of our autosome and X chromosome Ne estimation with simulated data. We find that the estimated female and male effective population sizes generally reflect the simulated sex-specific effective population sizes across the past 100 generations but that short-term differences between the estimated sex-specific Ne across tens of generations may not reliably indicate true sex-specific differences. We analyzed the effective size of populations represented by samples of sequenced UK White British and UK Indian individuals from the UK Biobank.
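The step from autosomal and X chromosome Ne to female and male effective sizes can be illustrated with the classical Wright relations Ne_A = 4·N_f·N_m / (N_f + N_m) and Ne_X = 9·N_f·N_m / (4·N_m + 2·N_f). That the paper uses exactly these relations is an assumption here. Writing them as 4/Ne_A = 1/N_f + 1/N_m and 9/Ne_X = 4/N_f + 2/N_m and solving the linear system gives closed forms, which the hypothetical helper `sex_specific_ne` evaluates.

```python
from typing import Tuple


def sex_specific_ne(ne_autosome: float, ne_x: float) -> Tuple[float, float]:
    """Return (female Ne, male Ne) implied by autosomal and X chromosome Ne.

    Assumes the classical relations
        Ne_A = 4 Nf Nm / (Nf + Nm)
        Ne_X = 9 Nf Nm / (4 Nm + 2 Nf)
    which rearrange to
        1/Nf = 9 / (2 Ne_X) - 4 / Ne_A
        1/Nm = 8 / Ne_A - 9 / (2 Ne_X)
    A solution with both sizes positive exists only when
    9/16 < Ne_X / Ne_A < 9/8.
    """
    inv_female = 9.0 / (2.0 * ne_x) - 4.0 / ne_autosome
    inv_male = 8.0 / ne_autosome - 9.0 / (2.0 * ne_x)
    if inv_female <= 0 or inv_male <= 0:
        raise ValueError("Ne_X / Ne_A must lie strictly between 9/16 and 9/8")
    return 1.0 / inv_female, 1.0 / inv_male


if __name__ == "__main__":
    # Equal numbers of breeding females and males (Nf = Nm = 5000) give
    # Ne_A = 10000 and Ne_X = 7500, the familiar 3/4 ratio; the call
    # should recover approximately (5000, 5000).
    print(sex_specific_ne(10000.0, 7500.0))
```

A ratio Ne_X / Ne_A above 3/4 points to more effective females than males, and a ratio below 3/4 to the reverse; as the abstract cautions, short-term fluctuations in such estimates may not reflect true sex-specific differences.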


Subjects
Genetics, Population; X Chromosome; Humans; Male; Female; Population Density
6.
HGG Adv ; 4(3): 100207, 2023 Jul 13.
Article in English | MEDLINE | ID: mdl-37333771

ABSTRACT

Alzheimer disease (AD) is the most common form of senile dementia, with high incidence late in life in many populations including Caribbean Hispanic (CH) populations. Such admixed populations, descended from more than one ancestral population, can present challenges for genetic studies, including limited sample sizes and unique analytical constraints. Therefore, CH populations and other admixed populations have not been well represented in studies of AD, and much of the genetic variation contributing to AD risk in these populations remains unknown. Here, we conduct genome-wide analysis of AD in multiplex CH families from the Alzheimer Disease Sequencing Project (ADSP). We developed, validated, and applied an implementation of a logistic mixed model for admixture mapping with binary traits that leverages genetic ancestry to identify ancestry-of-origin loci contributing to AD. We identified three loci on chromosome 13q33.3 associated with reduced risk of AD, where associations were driven by Native American (NAM) ancestry. This AD admixture mapping signal spans the FAM155A, ABHD13, TNFSF13B, LIG4, and MYO16 genes and was supported by evidence for association in an independent sample from the Alzheimer's Genetics in Argentina-Alzheimer Argentina consortium (AGA-ALZAR) study with considerable NAM ancestry. We also provide evidence of NAM haplotypes and key variants within 13q33.3 that segregate with AD in the ADSP whole-genome sequencing data. Interestingly, the widely used genome-wide association study approach failed to identify associations in this region. Our findings underscore the potential of leveraging genetic ancestry diversity in recently admixed populations to improve genetic mapping, in this case for AD-relevant loci.


Subjects
Alzheimer Disease; Humans; Alzheimer Disease/genetics; Genome-Wide Association Study; Hispanic or Latino/genetics; Genetic Loci/genetics; Ethnicity
7.
Am J Hum Genet ; 110(2): 326-335, 2023 Feb 02.
Article in English | MEDLINE | ID: mdl-36610402

ABSTRACT

Local ancestry is the source ancestry at each point in the genome of an admixed individual. Inferred local ancestry is used for admixture mapping and population genetic analyses. We present FLARE (fast local ancestry estimation), a method for local ancestry inference. FLARE achieves high accuracy through the use of an extended Li and Stephens model, and it achieves exceptional computational performance through incorporation of computational techniques developed for genotype imputation. Memory requirements are reduced through on-the-fly compression of reference haplotypes and stored checkpoints. Computation time is reduced through the use of composite reference haplotypes. These techniques allow FLARE to scale to datasets with hundreds of thousands of sequenced individuals and to provide superior accuracy on large-scale data. FLARE is open source and available at https://github.com/browning-lab/flare.
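FLARE extends the Li and Stephens haplotype copying model. The basic haploid form of that model, in which a target haplotype is a mosaic of reference haplotypes with occasional switches and mismatches, can be sketched with a scaled forward algorithm. This is not FLARE's extended model, which also tracks ancestry and uses composite reference haplotypes; the helper name `li_stephens_loglik` and the constant switch probability `rho` and mismatch probability `eps` are hypothetical simplifications.

```python
import math
from typing import List, Sequence


def li_stephens_loglik(target: Sequence[int], panel: List[Sequence[int]],
                       rho: float, eps: float) -> float:
    """Log-likelihood of `target` under a basic haploid Li and Stephens model.

    panel[k][s] is allele s (0/1) on reference haplotype k.
    Between consecutive markers the copied haplotype is redrawn uniformly
    at random with probability rho (possibly redrawing the same one).
    Each marker emits the copied allele with probability 1 - eps.
    """
    n_ref = len(panel)

    def emit(k: int, s: int) -> float:
        return 1.0 - eps if panel[k][s] == target[s] else eps

    # Start by copying any reference haplotype with equal probability.
    alpha = [emit(k, 0) / n_ref for k in range(n_ref)]
    total = sum(alpha)
    loglik = math.log(total)
    alpha = [a / total for a in alpha]  # rescale to avoid underflow

    for s in range(1, len(target)):
        # alpha sums to 1 after rescaling, so the uniform-jump mass is rho / n_ref.
        alpha = [((1.0 - rho) * alpha[k] + rho / n_ref) * emit(k, s)
                 for k in range(n_ref)]
        total = sum(alpha)
        loglik += math.log(total)
        alpha = [a / total for a in alpha]
    return loglik


if __name__ == "__main__":
    panel = [[0, 0, 1, 1], [1, 1, 0, 0]]
    # This target is best explained by copying haplotype 0, then switching to haplotype 1.
    print(li_stephens_loglik([0, 0, 0, 0], panel, rho=0.1, eps=0.01))
```

Each step costs time proportional to the number of reference haplotypes, which is why large panels motivate the memory and time savings the abstract describes.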


Subjects
Genetics, Population; Genome, Human; Humans; Ethnicity; Genotype; Haplotypes/genetics
8.
Am J Hum Genet ; 110(1): 161-165, 2023 Jan 05.
Article in English | MEDLINE | ID: mdl-36450278

ABSTRACT

The first release of UK Biobank whole-genome sequence data contains 150,119 genomes. We present an open-source pipeline for filtering, phasing, and indexing these genomes on the cloud-based UK Biobank Research Analysis Platform. This pipeline makes it possible to apply haplotype-based methods to UK Biobank whole-genome sequence data. The pipeline uses BCFtools for marker filtering, Beagle for genotype phasing, and Tabix for VCF indexing. We used the pipeline to phase 406 million single-nucleotide variants on chromosomes 1-22 and X at a cost of £2,309. The maximum time required to process a chromosome was 2.6 days. In order to assess phase accuracy, we modified the pipeline to exclude trio parents. We observed a switch error rate of 0.0016 on chromosome 20 in the White British trio offspring. If we exclude markers with nonmajor allele frequency < 0.1% after phasing, this switch error rate decreases by 80% to 0.00032.


Subjects
Biological Specimen Banks; Genome; Humans; Dogs; Animals; Genotype; Haplotypes/genetics; Polymorphism, Single Nucleotide/genetics; United Kingdom; Algorithms; Sequence Analysis, DNA/methods
9.
Am J Hum Genet ; 109(12): 2178-2184, 2022 Dec 01.
Article in English | MEDLINE | ID: mdl-36370709

ABSTRACT

We provide a method for estimating the genome-wide mutation rate from sequence data on unrelated individuals by using segments of identity by descent (IBD). The length of an IBD segment indicates the time to shared ancestor of the segment, and mutations that have occurred since the shared ancestor result in discordances between the two IBD haplotypes. Previous methods for IBD-based estimation of mutation rate have required the use of family data for accurate phasing of the genotypes. This has limited the scope of application of IBD-based mutation rate estimation. Here, we develop an IBD-based method for mutation rate estimation from population data, and we apply it to whole-genome sequence data on 4,166 European American individuals from the TOPMed Framingham Heart Study, 2,996 European American individuals from the TOPMed My Life, Our Future study, and 1,586 African American individuals from the TOPMed Hypertension Genetic Epidemiology Network study. Although mutation rates may differ between populations as a result of genetic factors, demographic factors such as average parental age, and environmental exposures, our results are consistent with equal genome-wide average mutation rates across these three populations. Our overall estimate of the average genome-wide mutation rate per 10⁸ base pairs per generation for single-nucleotide variants is 1.24 (95% CI 1.18-1.33).


Subjects
Genome, Human; Mutation Rate; Humans; Genome, Human/genetics; Polymorphism, Single Nucleotide/genetics; Haplotypes; Genotype
10.
Am J Hum Genet ; 109(6): 1016-1025, 2022 Jun 02.
Article in English | MEDLINE | ID: mdl-35659928

ABSTRACT

Haplotypes can be estimated from unphased genotype data via statistical methods. When parent-offspring trios are available for inferring the true phase from Mendelian inheritance rules, the accuracy of statistical phasing is usually measured by the switch error rate, which is the proportion of pairs of consecutive heterozygotes that are incorrectly phased. We present a method for estimating the genotype error rate from parent-offspring trios and a method for estimating the bias that occurs in the observed switch error rate as a result of genotype error. We apply these methods to 485,301 genotyped UK Biobank samples that include 898 White British trios and to 38,387 sequenced TOPMed samples that include 217 African Caribbean trios and 669 European American trios. We show that genotype error inflates the observed switch error rate and that the relative bias increases with sample size. For the UK Biobank White British trios, the observed switch error rate in the trio offspring is 2.4 times larger than the estimated true switch error rate (1.4 × 10⁻³ vs. 5.8 × 10⁻⁴). We propose an alternate definition of phase error that counts two consecutive switch errors as a single error because back-to-back switch errors arise when a single heterozygote is incorrectly phased with respect to the surrounding heterozygotes. With this definition, we estimate that the average distance between phase errors is 64 megabases in the UK Biobank White British individuals.
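The two error definitions can be made concrete. Suppose each heterozygote is labelled by whether its inferred orientation agrees with the true, trio-derived orientation; a switch error occurs wherever that label flips between consecutive heterozygotes, and the alternate definition merges a flip that is immediately followed by a flip back (one mis-phased heterozygote) into a single error. The helpers below are a hypothetical illustration of these definitions only; they do not implement the paper's genotype-error correction.

```python
from typing import List, Sequence


def switch_positions(agrees: Sequence[bool]) -> List[int]:
    """Indices j >= 1 where heterozygote j is phased inconsistently with j - 1.

    agrees[j] is True when the inferred orientation of heterozygote j
    matches its true (e.g. trio-derived) orientation.
    """
    return [j for j in range(1, len(agrees)) if agrees[j] != agrees[j - 1]]


def switch_error_rate(agrees: Sequence[bool]) -> float:
    """Switch errors per pair of consecutive heterozygotes."""
    n_pairs = len(agrees) - 1
    return len(switch_positions(agrees)) / n_pairs if n_pairs > 0 else 0.0


def phase_error_count(agrees: Sequence[bool]) -> int:
    """Count phase errors, merging back-to-back switches into one error.

    Switches at consecutive pairs (j, j + 1) arise when heterozygote j
    alone is flipped relative to its neighbours, so they count once.
    """
    switches = switch_positions(agrees)
    count, i = 0, 0
    while i < len(switches):
        if i + 1 < len(switches) and switches[i + 1] == switches[i] + 1:
            i += 2  # one mis-phased heterozygote produces two adjacent switches
        else:
            i += 1
        count += 1
    return count


if __name__ == "__main__":
    # Heterozygote 2 is flipped on its own; a genuine switch occurs before heterozygote 5.
    agrees = [True, True, False, True, True, False, False]
    print(switch_positions(agrees))   # [2, 3, 5]
    print(switch_error_rate(agrees))  # 0.5
    print(phase_error_count(agrees))  # 2
```

In this toy example the three switch errors collapse to two phase errors under the alternate definition.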


Subjects
Heredity; Polymorphism, Single Nucleotide; Bias; Genotype; Haplotypes/genetics; Humans; Polymorphism, Single Nucleotide/genetics
11.
HGG Adv ; 3(2): 100096, 2022 Apr 14.
Article in English | MEDLINE | ID: mdl-35300209

ABSTRACT

Allele frequency estimates in admixed populations, such as Hispanics and Latinos, rely on the sample's specific admixture composition and thus may differ between two seemingly similar populations. However, ancestry-specific allele frequencies, i.e., pertaining to the ancestral populations of an admixed group, may be particularly useful for prioritizing genetic variants for genetic discovery and personalized genomic health. We developed a method, ancestry-specific allele frequency estimation in admixed populations (AFA), to estimate the frequencies of biallelic variants in admixed populations with an unlimited number of ancestries. AFA uses maximum-likelihood estimation by modeling the conditional probability of having an allele given proportions of genetic ancestries. It can be applied using either local ancestry interval proportions encompassing the variant (local-ancestry-specific allele frequency estimations in admixed populations [LAFAs]) or global proportions of genetic ancestries (global-ancestry-specific allele frequency estimations in admixed populations [GAFAs]), which are easier to compute and are more widely available. Simulations and comparisons to existing software demonstrated the high accuracy of the method. We implemented AFA on high-quality imputed data of ∼9,000 Hispanics and Latinos from the Hispanic Community Health Study/Study of Latinos (HCHS/SOL), an understudied, admixed population with three predominant continental ancestries: Amerindian, European, and African. Comparison of the European and African estimated frequencies to the respective gnomAD frequencies demonstrated high correlations (Pearson R² = 0.97-0.99). We provide a genome-wide dataset of the estimated ancestry-specific allele frequencies for available variants with allele frequency between 5% and 95% in at least one of the three ancestral populations.
Association analysis of Amerindian-enriched variants with cardiometabolic traits identified five loci associated with lipid traits in Hispanics and Latinos, demonstrating the utility of ancestry-specific allele frequencies in admixed populations.
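The model described above, in which each allele copy carries the variant with probability Σ_k q_k·f_k (ancestry proportions q_k, ancestry-specific frequencies f_k), can be fitted with a small expectation-maximization (EM) routine using global ancestry proportions, the GAFA setting. EM is one generic way to maximize this likelihood and is an assumption here; the abstract does not state which optimizer AFA uses, and the name `estimate_afa` is hypothetical.

```python
from typing import List, Sequence


def estimate_afa(genotypes: Sequence[int], proportions: Sequence[Sequence[float]],
                 n_iter: int = 200, tol: float = 1e-10) -> List[float]:
    """EM estimate of ancestry-specific allele frequencies (hypothetical helper).

    genotypes[i] is the alternate-allele count (0, 1 or 2) of individual i.
    proportions[i][k] is individual i's proportion of ancestry k (rows sum to 1).
    Each allele copy is modelled as carrying the alternate allele with
    probability sum_k proportions[i][k] * f[k].
    """
    n_anc = len(proportions[0])
    f = [0.5] * n_anc
    for _ in range(n_iter):
        alt_mass = [0.0] * n_anc  # expected alternate copies from ancestry k
        all_mass = [0.0] * n_anc  # expected total copies from ancestry k
        for g, q in zip(genotypes, proportions):
            p_alt = [q[k] * f[k] for k in range(n_anc)]
            p_ref = [q[k] * (1.0 - f[k]) for k in range(n_anc)]
            s_alt, s_ref = sum(p_alt), sum(p_ref)
            for k in range(n_anc):
                # E-step: posterior ancestry of each copy, weighted by copy counts.
                w_alt = g * p_alt[k] / s_alt if g > 0 and s_alt > 0 else 0.0
                w_ref = (2 - g) * p_ref[k] / s_ref if g < 2 and s_ref > 0 else 0.0
                alt_mass[k] += w_alt
                all_mass[k] += w_alt + w_ref
        # M-step: frequency = expected alternate copies / expected copies.
        new_f = [alt_mass[k] / all_mass[k] if all_mass[k] > 0 else f[k]
                 for k in range(n_anc)]
        converged = max(abs(a - b) for a, b in zip(new_f, f)) < tol
        f = new_f
        if converged:
            break
    return f


if __name__ == "__main__":
    # Four individuals of pure ancestry 0, then four of pure ancestry 1.
    genotypes = [0, 1, 2, 1, 0, 0, 1, 0]
    proportions = [[1.0, 0.0]] * 4 + [[0.0, 1.0]] * 4
    print(estimate_afa(genotypes, proportions))  # [0.5, 0.125]
```

With unadmixed individuals the estimate reduces to ordinary within-group allele frequencies; admixed individuals contribute fractionally to each ancestry through the E-step.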

12.
Genome Med ; 14(1): 27, 2022 Mar 08.
Article in English | MEDLINE | ID: mdl-35260199

ABSTRACT

BACKGROUND: The Colombian population, as well as those in other Latin American regions, arose from a recent tri-continental admixture among Native Americans, Spanish invaders, and enslaved Africans, all of whom passed through a population bottleneck due to widespread infectious diseases that left small isolated local settlements. As a result, the current population reflects multiple founder effects derived from diverse ancestries. METHODS: We characterized the role of admixture and founder effects on the origination of the mutational landscape that led to neurodegenerative disorders under these historical circumstances. Genomes from 900 Colombian individuals with Alzheimer's disease (AD) [n = 376], frontotemporal lobar degeneration-motor neuron disease continuum (FTLD-MND) [n = 197], early-onset dementia not otherwise specified (EOD) [n = 73], and healthy participants [n = 254] were analyzed. We examined their global and local ancestry proportions and screened this cohort for deleterious variants in disease-causing and risk-conferring genes. RESULTS: We identified 21 pathogenic variants in AD-FTLD related genes, and PSEN1 harbored the majority (11 pathogenic variants). Variants were identified from all three continental ancestries. TREM2 heterozygous and homozygous variants were the most common among AD risk genes (102 carriers), a point of interest because the disease risk conferred by these variants differed according to ancestry. Several gene variants that have a known association with MND in European populations had FTLD phenotypes on a Native American haplotype. Consistent with founder effects, identity by descent among carriers of the same variant was frequent. CONCLUSIONS: Colombian demography with multiple mini-bottlenecks probably enhanced the detection of founder events and left a proportionally higher frequency of rare variants derived from the ancestral populations. 
These findings demonstrate the role of genomically defined ancestry in phenotypic disease expression, a phenotypic range of different rare mutations in the same gene, and further emphasize the importance of inclusiveness in genetic studies.


Subjects
Alzheimer Disease; Frontotemporal Lobar Degeneration; Neurodegenerative Diseases; Alzheimer Disease/genetics; Colombia; Founder Effect; Frontotemporal Lobar Degeneration/genetics; Humans; Mutation; Neurodegenerative Diseases/genetics
13.
Am J Hum Genet ; 108(10): 1880-1890, 2021 Oct 07.
Article in English | MEDLINE | ID: mdl-34478634

ABSTRACT

Haplotype phasing is the estimation of haplotypes from genotype data. We present a fast, accurate, and memory-efficient haplotype phasing method that scales to large-scale SNP array and sequence data. The method uses marker windowing and composite reference haplotypes to reduce memory usage and computation time. It incorporates a progressive phasing algorithm that identifies confidently phased heterozygotes in each iteration and fixes the phase of these heterozygotes in subsequent iterations. For data with many low-frequency variants, such as whole-genome sequence data, the method employs a two-stage phasing algorithm that phases high-frequency markers via progressive phasing in the first stage and phases low-frequency markers via genotype imputation in the second stage. This haplotype phasing method is implemented in the open-source Beagle 5.2 software package. We compare Beagle 5.2 and SHAPEIT 4.2.1 by using expanding subsets of 485,301 UK Biobank samples and 38,387 TOPMed samples. Both methods have very similar accuracy and computation time for UK Biobank SNP array data. However, for TOPMed sequence data, Beagle is more than 20 times faster than SHAPEIT, achieves similar accuracy, and scales to larger sample sizes.


Subjects
Asthma/genetics; Atrial Fibrillation/genetics; Data Interpretation, Statistical; Genome, Human; Haplotypes; Polymorphism, Single Nucleotide; Software; Algorithms; Female; Genome-Wide Association Study; Genotype; Humans; Male
14.
STAR Protoc ; 2(2): 100550, 2021 Jun 18.
Article in English | MEDLINE | ID: mdl-34095864

ABSTRACT

The SPrime program detects the variants in current-day populations that were introgressed from an archaic source in the past. It is optimized for detecting introgression from Neanderthals and Denisovans in modern humans. We provide a protocol for detecting Neanderthal and Denisovan introgression in 1000 Genomes Project data, specifically focusing on the CHB (Han Chinese in Beijing) population. For complete details on the use and execution of this protocol, please refer to Browning et al. (2018).


Subjects
Genetic Introgression/genetics; Genomics/methods; Neanderthals/genetics; Animals; DNA, Ancient/analysis; Hominidae/genetics; Humans
15.
Am J Hum Genet ; 107(5): 895-910, 2020 Nov 05.
Article in English | MEDLINE | ID: mdl-33053335

ABSTRACT

Most methods for fast detection of identity by descent (IBD) segments report identity by state segments without any quantification of the uncertainty in the endpoints and lengths of the IBD segments. We present a method for determining the posterior probability distribution of IBD segment endpoints. Our approach accounts for genotype errors, recent mutations, and gene conversions which disrupt DNA sequence identity within IBD segments, and it can be applied to large cohorts with whole-genome sequence or SNP array data. We find that our method's estimates of uncertainty are well calibrated for homogeneous samples. We quantify endpoint uncertainty for 77.7 billion IBD segments from 408,883 individuals of white British ancestry in the UK Biobank, and we use these IBD segments to find regions showing evidence of recent natural selection. We show that many spurious selection signals are eliminated by the use of unbiased estimates of IBD segment endpoints and a pedigree-based genetic map. Eleven of the twelve regions with the greatest evidence for recent selection in our scan have been identified as selected in previous analyses using different approaches. Our computationally efficient method for quantifying IBD segment endpoint uncertainty is implemented in the open source ibd-ends software package.


Subjects
Biometric Identification/methods; Chromosome Mapping/statistics & numerical data; Genome, Human; Inheritance Patterns; Models, Statistical; Polymorphism, Single Nucleotide; Biological Specimen Banks; Family; Female; Genome-Wide Association Study; Humans; Male; Pedigree; Software; Uncertainty; United Kingdom
16.
Bioinformatics ; 36(16): 4519-4520, 2020 Aug 15.
Article in English | MEDLINE | ID: mdl-32844204

ABSTRACT

MOTIVATION: Estimation of pairwise kinship coefficients in large datasets is computationally challenging because the number of related individuals increases quadratically with sample size. RESULTS: We present IBDkin, a software package written in C for estimating kinship coefficients from identity by descent (IBD) segments. We use IBDkin to estimate kinship coefficients for 7.95 billion pairs of individuals in the UK Biobank who share at least one detected IBD segment with length ≥ 4 cM. AVAILABILITY AND IMPLEMENTATION: https://github.com/YingZhou001/IBDkin. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
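A common way to turn haplotype-level IBD segments into a kinship coefficient is to sum the IBD length shared across the four haplotype pairings of two individuals and divide by four times the genome length. A region that is IBD on one pairing (IBD1) then contributes 1/4 and a region that is IBD on two pairings (IBD2) contributes 1/2, matching the definition of kinship as the probability that alleles drawn at random from the two individuals are IBD. This is a simplified sketch: IBDkin's actual estimator may include further corrections, and the segment record layout, the helper name `kinship_from_ibd`, and the genome length used below are assumptions.

```python
from typing import Iterable, Tuple

# Hypothetical segment record: (haplotype of individual 1 [0 or 1],
# haplotype of individual 2 [0 or 1], start_cm, end_cm).
Segment = Tuple[int, int, float, float]


def kinship_from_ibd(segments: Iterable[Segment], genome_length_cm: float) -> float:
    """Kinship coefficient from haplotype-level IBD segments between two individuals.

    Sums segment lengths across the four haplotype pairings and divides by
    4 * genome length, so IBD1 regions count 1/4 and IBD2 regions count 1/2.
    Segments on the same haplotype pairing are assumed not to overlap.
    """
    total_cm = sum(end - start for _, _, start, end in segments)
    return total_cm / (4.0 * genome_length_cm)


if __name__ == "__main__":
    genome_cm = 3500.0  # hypothetical map length, with positions on one cumulative axis
    parent_child = [(0, 0, 0.0, genome_cm)]  # one haplotype pairing IBD everywhere
    print(kinship_from_ibd(parent_child, genome_cm))  # 0.25
```

Only IBD segments above the detection threshold (here, 4 cM in the abstract) enter the sum, so pairs whose sharing lies mainly in shorter segments would be underestimated by this naive version.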


Subjects
Software; Humans
17.
Am J Hum Genet ; 107(1): 137-148, 2020 Jul 02.
Article in English | MEDLINE | ID: mdl-32533945

ABSTRACT

Recombination rates vary significantly across the genome, and estimates of recombination rates are needed for downstream analyses such as haplotype phasing and genotype imputation. Existing methods for recombination rate estimation are limited by insufficient amounts of informative genetic data or by high computational cost. We present a method and software, called IBDrecomb, for using segments of identity by descent to infer recombination rates. IBDrecomb can be applied to sequenced population cohorts to obtain high-resolution, population-specific recombination maps. In simulated admixed data, IBDrecomb obtains higher accuracy than admixture-based estimation of recombination rates. When applied to 2,500 simulated individuals, IBDrecomb obtains similar accuracy to a linkage-disequilibrium (LD)-based method applied to 96 individuals (the largest number for which computation is tractable). Compared to LD-based maps, our IBD-based maps have the advantage of estimating recombination rates in the recent past rather than the distant past. We used IBDrecomb to generate new recombination maps for European Americans and for African Americans from TOPMed sequence data from the Framingham Heart Study (1,626 unrelated individuals) and the Jackson Heart Study (2,046 unrelated individuals), and we compare them to LD-based, admixture-based, and family-based maps.


Subjects
Recombination, Genetic/genetics; Black or African American/genetics; Genetics, Population/methods; Genome, Human/genetics; Haplotypes/genetics; Humans; Linkage Disequilibrium/genetics; Polymorphism, Single Nucleotide/genetics; White People/genetics
18.
Am J Hum Genet ; 106(4): 426-437, 2020 Apr 02.
Article in English | MEDLINE | ID: mdl-32169169

ABSTRACT

Segments of identity by descent (IBD) are used in many genetic analyses. We present a method for detecting identical-by-descent haplotype segments in phased genotype data. Our method, called hap-IBD, combines a compressed representation of haplotype data, the positional Burrows-Wheeler transform, and multi-threaded execution to produce very fast analysis times. An attractive feature of hap-IBD is its simplicity: the input parameters clearly and precisely define the IBD segments that are reported, so that program correctness can be confirmed by users. We evaluate hap-IBD and four state-of-the-art IBD segment detection methods (GERMLINE, iLASH, RaPID, and TRUFFLE) using UK Biobank chromosome 20 data and simulated sequence data. We show that hap-IBD detects IBD segments faster and more accurately than competing methods, and that hap-IBD is the only method that can rapidly and accurately detect short 2-4 centiMorgan (cM) IBD segments in the full UK Biobank data. Analysis of 485,346 UK Biobank samples through the use of hap-IBD with 12 computational threads detects 231.5 billion autosomal IBD segments with length ≥2 cM in 24.4 h.
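hap-IBD builds on the positional Burrows-Wheeler transform (PBWT). The core PBWT update keeps haplotypes sorted by their reversed prefixes and records, in a divergence array, where each haplotype's match with its sorted neighbour begins; it can be sketched following Durbin's original PBWT algorithm. This minimal illustration omits hap-IBD's compressed haplotype representation, multithreading, and segment-reporting rules, and the generator name `pbwt_sweep` is hypothetical.

```python
from typing import Iterator, List, Sequence, Tuple


def pbwt_sweep(haplotypes: Sequence[Sequence[int]]) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (prefix array, divergence array) after each marker.

    After processing markers 0..k, prefix[i] is the haplotype with the
    i-th smallest reversed prefix, and divergence[i] is the first marker of
    the match ending at k between prefix[i] and prefix[i - 1]
    (k + 1 means they differ at marker k; divergence[0] is always k + 1).
    """
    n_hap = len(haplotypes)
    n_markers = len(haplotypes[0])
    prefix = list(range(n_hap))
    divergence = [0] * n_hap
    for k in range(n_markers):
        zeros, ones = [], []
        div_zeros, div_ones = [], []
        p = q = k + 1
        for idx, d in zip(prefix, divergence):
            # p / q: start of the match with the last haplotype carrying allele 0 / 1.
            p = max(p, d)
            q = max(q, d)
            if haplotypes[idx][k] == 0:
                zeros.append(idx)
                div_zeros.append(p)
                p = 0
            else:
                ones.append(idx)
                div_ones.append(q)
                q = 0
        # Stable partition by the allele at marker k keeps reversed-prefix order.
        prefix = zeros + ones
        divergence = div_zeros + div_ones
        yield prefix, divergence


if __name__ == "__main__":
    haps = [[0, 1, 0, 1, 1], [0, 1, 1, 1, 0], [1, 1, 0, 1, 1], [0, 0, 0, 1, 1]]
    for k, (prefix, divergence) in enumerate(pbwt_sweep(haps)):
        print(k, prefix, divergence)
```

At marker k, neighbours with (k + 1) minus divergence[i] markers of agreement share a match of that length ending at k. The abstract's thresholds are in centiMorgans, so a practical tool would measure match length in genetic distance, and reporting every long-matching pair, not just sorted neighbours, needs an additional scan over the prefix array.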


Subjects
Genome, Human/genetics; Sequence Analysis, DNA/methods; Alleles; Chromosomes/genetics; Computer Simulation; Data Analysis; Genetic Markers/genetics; Genetics, Population/methods; Genotype; Haplotypes/genetics; Humans; Polymorphism, Single Nucleotide/genetics; Software
19.
Proc Natl Acad Sci U S A ; 117(5): 2560-2569, 2020 Feb 04.
Article in English | MEDLINE | ID: mdl-31964835

ABSTRACT

De novo mutations (DNMs), or mutations that appear in an individual despite not being seen in their parents, are an important source of genetic variation whose impact is relevant to studies of human evolution, genetics, and disease. Utilizing high-coverage whole-genome sequencing data as part of the Trans-Omics for Precision Medicine (TOPMed) Program, we called 93,325 single-nucleotide DNMs across 1,465 trios from an array of diverse human populations, and used them to directly estimate and analyze DNM counts, rates, and spectra. We find a significant positive correlation between local recombination rate and local DNM rate, and that DNM rate explains a substantial portion (8.98 to 34.92%, depending on the model) of the genome-wide variation in population-level genetic variation from 41K unrelated TOPMed samples. Genome-wide heterozygosity does correlate with DNM rate, but only explains <1% of variation. While we are underpowered to see small differences, we do not find significant differences in DNM rate between individuals of European, African, and Latino ancestry, nor across ancestrally distinct segments within admixed individuals. However, we did find significantly fewer DNMs in Amish individuals, even when compared with other Europeans, and even after accounting for parental age and sequencing center. Specifically, we found significant reductions in the number of C→A and T→C mutations in the Amish, which seem to underpin their overall reduction in DNMs. Finally, we calculated near-zero estimates of narrow sense heritability (h²), which suggest that variation in DNM rate is significantly shaped by nonadditive genetic effects and the environment.


Subjects
Amish/genetics; Genome, Human; Adult; Cohort Studies; DNA Mutational Analysis; Female; Genetics, Population; Heterozygote; Humans; Male; Mutation; Pedigree; Whole Genome Sequencing; Young Adult
20.
Am J Hum Genet ; 105(5): 883-893, 2019 Nov 07.
Article in English | MEDLINE | ID: mdl-31587867

ABSTRACT

The two primary methods for estimating the genome-wide mutation rate have been counting de novo mutations in parent-offspring trios and comparing sequence data between closely related species. With parent-offspring trio analysis it is difficult to control for genotype error, and resolution is limited because each trio provides information from only two meioses. Inter-species comparison is difficult to calibrate due to uncertainty in the number of meioses separating species, and it can be biased by selection and by changing mutation rates over time. An alternative class of approaches for estimating mutation rates that avoids these limitations is based on identity by descent (IBD) segments that arise from common ancestry within the past few thousand years. Existing IBD-based methods are limited to highly inbred samples, or lack robustness to genotype error and error in the estimated demographic history. We present an IBD-based method that uses sharing of IBD segments among sets of three individuals to estimate the mutation rate. Our method is applicable to accurately phased genotype data, such as parent-offspring trio data phased using Mendelian rules of inheritance. Unlike standard parent-offspring analysis, our method utilizes distant relationships and is robust to genotype error. We apply our method to data from 1,307 European-ancestry individuals in the Framingham Heart Study sequenced by the NHLBI TOPMed project. We obtain an estimate of 1.29 × 10⁻⁸ mutations per base pair per meiosis with a 95% confidence interval of [1.02 × 10⁻⁸, 1.56 × 10⁻⁸].


Subjects
Genome, Human/genetics; Mutation/genetics; Genotype; Heredity/genetics; Humans; Meiosis/genetics; Mutation Rate; Pedigree; Polymorphism, Single Nucleotide/genetics